Abstract. Many current recognition systems use constrained search to locate objects
in cluttered environments. Earlier analysis of one class of methods has shown that the 
expected amount of search is quadratic in the number of model and data features, if all 
the data is known to come from a single object, but is exponential when spurious data is 
included. To overcome this, many methods terminate search once an interpretation that 
is "good enough" is found. In this paper, we formally examine the combinatorics of this 
approach, showing that choosing correct termination procedures can dramatically reduce 
the search. In particular, we provide conditions on the object model and the scene clutter 
such that the expected search is polynomial. The analytic results are shown to be in 
agreement with empirical data for cluttered object recognition. 

Most current approaches to the recognition and localization of objects from
noisy sensor data in cluttered environments utilize a search process to find solutions 
to the problem. Typically, this search finds interpretations of the data by identifying 
pairings of data features to model features that are consistent with a rigid
transformation of the object model into sensor coordinates. There are many variations
on this approach, including hypothesize and test methods [e.g. Lowe 1985, 1987,
Ayache & Faugeras 1986, Huttenlocher & Ullman 1987, Huttenlocher 1989], maximal
clique methods [e.g. Bolles & Cain 1982] and constrained tree search methods
[e.g. Grimson & Lozano-Perez 1984, 1987, Gaston & Lozano-Perez 1984, Murray
1987a, 1987b, Murray & Cook 1988, Drumheller 1987]. Formal analysis of the last
class of methods [Grimson 1989] has shown that performance is very different when 
all of the sensory data are known to have come from a single object, as opposed to 
sensory data that includes spurious features. If all of the data are known to have 
come from a single object, the expected amount of search required to find a correct 
interpretation is on the order of 

$$O(m^2 + \alpha m s)$$

where m is the number of model features, s is the number of data features, and $\alpha$ is
a small constant. In most of the problems of interest, s > m so that the expected
amount of search is quadratic in the parameters of the problem, and is linear in the 
number of data-model pairings. On the other hand, if spurious data is allowed, the 
expected amount of search is bounded above by an expression on the order of 

$$O\left(m s 2^{c} + m^2 s^2 [1 + \epsilon]^{c} + \delta m^{6} + m[1 + \gamma]^{s}\right),$$

where again m is the number of model features, s is the number of sensor features,
of which c correctly arise from the object, and $\epsilon, \gamma < 1$ are small constants, and it
is bounded below by an expression on the order of

$$o\left(m 2^{c} + m s\right).$$

Depending on the specific parameters of the problem, different terms of these ex- 
pressions will dominate, but in general, the expected search is now exponential in 
the problem size. 
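
To get a feel for the difference between the two regimes, the following minimal sketch evaluates the single-object expression $m^2 + \alpha m s$ and the lower-order expression $m 2^c + m s$ for the cluttered case. The function names and the sample values of m, s, c and $\alpha$ are illustrative assumptions, not values taken from the analysis, and constant factors hidden by the order notation are ignored.

```python
def single_object_search(m: int, s: int, alpha: float) -> float:
    """Evaluate m^2 + alpha*m*s, the order of the search when all data
    features come from one object (alpha is an assumed small constant)."""
    return m**2 + alpha * m * s


def cluttered_search_lower(m: int, s: int, c: int) -> int:
    """Evaluate m*2^c + m*s, the order of the lower bound when only c of
    the s data features arise from the object."""
    return m * 2**c + m * s


if __name__ == "__main__":
    m, s, alpha = 10, 40, 0.5  # illustrative values only
    for c in (5, 10, 20):
        print(c, single_object_search(m, s, alpha), cluttered_search_lower(m, s, c))
```

Even for these small sample sizes the $2^c$ term quickly dominates, which is the exponential behavior referred to above.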

This implies that one of the hard parts of the recognition problem is in sepa- 
rating out a correct subset of the data from the spurious data, where by correct we 
mean a subset of the data that arises from a single object. One means of attack- 
ing this problem is to use grouping mechanisms to preselect likely subspaces of the 
search space on which to focus. This can be done in a data driven fashion [Lowe 
1985, 1987, Jacobs 1987, 1988]. It can also be done in a model driven manner, for 
example, by using the generalized Hough transform [Ballard 1981]. In [Grimson 
k Huttenlocher 1988] we investigated the combinatorics of using such schemes. In 
particular, we showed that while such methods could reduce the size of the search 
space, they could not, in general, be used to select subspaces for which all of the 
sensory features came from a single object, without at the same time encurring a 
non-trivial false positive rate. 



A second approach is to use heuristic criteria to terminate the search process 
once an interpretation that is "good enough" has been found. In this paper we 
examine this alternative. The usual method used for terminating search [e.g. Ayache
& Faugeras 1986, Grimson & Lozano-Perez 1987, Lowe 1985, 1987] is to measure
the "goodness" of the interpretation, by determining what fraction of the object is 
accounted for by the interpretation, and to terminate the search when that measure 
exceeds some threshold. Typical measures include the number of model features 
included in the interpretation, or the amount of perimeter or surface area of the 
model included in the interpretation. In using such methods, there are two questions
of interest. The first is establishing first-principles methods for setting the threshold
for termination. In [Grimson & Huttenlocher 1989] we address this question on a
formal basis, showing that estimates for the threshold may be found as a function 
of the clutter of the scene and the size of the model, such that no false positive 
interpretations will be expected. In this paper, we turn to the second question, 
namely, to what extent does the use of premature termination of the search reduce 
the expected cost of the search process itself. 

1. The constrained search model. 

To determine the expected cost using premature termination, we first establish the 
search framework to be used in solving the recognition problem. We then review 
results from earlier analysis of the full constrained search method, before deriving 
new results on the use of premature termination. 

We begin by reviewing the constrained search method, used previously in [Grimson
& Lozano-Perez 1984, 1987, Gaston & Lozano-Perez 1984, Murray 1987a, 1987b,
Murray & Cook 1988, Drumheller 1987] as a basis for recognizing and locating objects.
The approach seeks to match data features to model features in a manner that
is consistent with some rigid transformation of the model into the sensory data. We 
assume that our models are represented by sets of geometric features, such as edges, 
distinctive points, surface patches, axes of cylinders, etc., and that the sensory data 
has been processed to obtain similar features. There are many methods for finding
matches between such features; the approach taken here is to explore the space of
possible correspondences by searching a tree of interpretations.

This tree search can be defined as follows. Suppose we order the data features 
in some arbitrary fashion. We select the first data feature, and hypothesize in turn 
that it is in correspondence with each of the model features. We represent this set 
of alternatives as a set of nodes at the same level of a tree (see Figure 1). 

Given each one of these hypothesized assignments of data feature $f_1$ to a model
feature, $F_j$, $j = 1, \ldots, m$, we turn to the second data feature. Again, we can consider
all possible assignments of the second data feature $f_2$ to model features, relative to
each of the assignments of the first data feature. This is shown in Figure 2. Note 
that the entire set of nodes in the second level of the tree corresponds to all possible 
matches for the first two data features. 

Figure 1. We can build a tree of possible interpretations, by first considering all the ways
of matching the first data feature, $f_1$, to each of the model features, $F_j$, $j = 1, \ldots, m$.

Figure 2. For each pairing of the first data feature with a model feature, we can consider
matchings for the second data feature with each of the model features. Each node in the 
second level of the tree defines a pairing for the first two data points, found by tracing up 
the tree to the nodes. An example is shown. 



We can continue in this manner, adding new levels to the tree, one for each
data feature. A node of the interpretation tree at level n describes a partial
n-interpretation, in that the nodes lying directly between the current node and the
root of the tree identify an assignment of model features to the first n data features.
Any leaf of the tree defines a complete s-interpretation, where s is the total number
of data features.

Our goal is to find consistent k-interpretations, where k is as large as possible,
k < s, and to find these interpretations with as little effort as possible. A simple-minded
method would examine each leaf of the tree, testing to see if there exists a
rigid transformation mapping each model feature into its associated data feature.
This is clearly too expensive, as it simply reverts to an exploration of the entire, 
exponential-size, search space. A better solution is to explore the interpretation tree, 
starting at its root, and testing interpretations as we move downward in the tree. As 
soon as we find a node that is not consistent, i.e. for which no rigid transform will
correctly align the model and data features, we terminate any further downward search
below that node, as adding new data-model pairings to the interpretation defined 
at that node will not turn an inconsistent interpretation into a consistent one. 

In testing for consistency at a node, we have two different choices. We could 
explicitly solve for the best rigid transformation, and test that all of the model fea- 
tures do in fact get mapped into agreement with their corresponding data features. 
This approach has two drawbacks. First, computing such a transformation is generally
computationally expensive (however, see [Faugeras & Hebert 1986, Ayache
& Faugeras 1986] for an efficient method for updating transformations), and we
would like to avoid any unnecessary use of such a computation. Second, in order to
compute such a transformation, we will need an interpretation of at least k data- 
model pairs, where k depends on the characteristics of the features. This means we 
must wait until we are at least k levels deep in the tree, before we can apply our 
consistency test, and this increases the amount of work that must be done. 

Our second choice is to look for less complete methods for testing consistency. 
We instead seek constraints that can be applied at any node of the interpretation 
tree, with the property that while no single constraint can uniquely guarantee the 
consistency of an interpretation, each constraint can rule out some interpretations. 
The hope is that if enough independent constraints can be combined together, their 
aggregation will prove powerful in determining consistency, but at a lower cost than 
fully solving for a transformation. 

In previous work, we developed a set of unary and binary constraints that can 
be applied to this problem [Grimson & Lozano-Perez 1984, 1987]. For example, if 
we are matching edge segments from a grey-level image, one unary constraint is that 
the length of the data edge must not be longer than the corresponding model edge, 
plus some bounded amount of error. Binary constraints apply to pairs of data-model 
pairings, for example, the angle between two data edges must be roughly the same 
as the angle between the corresponding model edges, and the range of distances 
between a pair of data edges must be contained within the corresponding range of 
distances for a pair of model edges, adjusted for error, and so on. Hence, if a unary 
constraint, applied to such a pairing, is true, then this implies that the data-model 
pairing may be part of a consistent interpretation. If it is false, however, then that 
pairing cannot possibly be part of such an interpretation. Binary constraints apply 
to pairs of data-model pairings, with the same logic. These kinds of constraints have 
the advantages of computational simplicity, while retaining considerable power to 
separate consistent from inconsistent interpretations, and of applicability at virtually 
any node in the interpretation tree. 

Formulated in this way, our approach to recognition can be considered as a
problem of constraint satisfaction, or consistent labelling, a problem that has received
considerable attention in the Artificial Intelligence literature [e.g. Freuder
1978, 1982, Gaschnig 1979, Haralick & Elliott 1980, Haralick & Shapiro 1979, Mackworth
1977, Mackworth & Freuder 1985, Montanari 1974, Nudel 1983, Waltz 1975].
When we analyze the performance of our system, we will use results from this literature
to guide our development.

To use these constraints, we must now specify a means of exploring the interpre- 
tation tree. We do this using back-tracking depth-first search. (See Figure 3.) That 
is, we begin at the root of the tree, and explore downwards along the first branch. At 
each node, we check the unary constraints applicable to the new data-model pairing, 
and we check the $n - 1$ sets of binary constraints obtained by considering the new
data-model pairing relative to each data-model pairing defined by an ancestor node. 
If all these constraints are consistent, then we continue downwards in the search. If 
one of them is inconsistent, we backtrack to the previous node. We then explore 
the next branch of that node. If there are no more branches, we backtrack another 
level, and so on. Note that the number of constraints increases as we go lower in the 
tree, and hence the likelihood that the interpretation is in fact globally consistent 
increases. 




Figure 3. The tree is searched in a depth-first, backtracking manner, starting at the root. 
If a node is found to be inconsistent, the downward search is terminated, and we backtrack. 
Any leaf of the tree that is reached by the search constitutes a hypothesized interpretation. 
The darker edges in the diagram indicate one example of a backtracking search. 

If we reach a leaf of the tree, we have a possible interpretation of the data relative 
to the model, which we can verify by solving for a rigid transformation and testing 
that it does take all of the model features into rough agreement with their associated 
data features. Even if we do reach a leaf of the tree, we do not abandon the search. 
Rather, we accumulate that possible interpretation, back-track and continue, until 
the entire tree has been explored, and all possible interpretations have been found. 
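
The backtracking search just described can be sketched as follows. This is a minimal illustration rather than the implementation used in the cited systems: the function `interpretation_tree_search`, the callback names `unary_ok` and `binary_ok`, and the one-dimensional toy problem are all assumptions introduced here, and the final verification of each leaf by solving for a rigid transformation is omitted.

```python
from typing import Callable, List, Tuple

Pairing = Tuple[int, int]  # (data feature index, model feature index)


def interpretation_tree_search(
    num_data: int,
    num_model: int,
    unary_ok: Callable[[Pairing], bool],
    binary_ok: Callable[[Pairing, Pairing], bool],
) -> List[List[Pairing]]:
    """Depth-first backtracking search; returns every leaf that survives pruning."""
    leaves: List[List[Pairing]] = []

    def extend(partial: List[Pairing]) -> None:
        level = len(partial)  # index of the next data feature to assign
        if level == num_data:  # reached a leaf: record it, then keep searching
            leaves.append(list(partial))
            return
        for model_idx in range(num_model):
            new = (level, model_idx)
            # Unary check on the new pairing, then binary checks against
            # every pairing defined by an ancestor node.
            if unary_ok(new) and all(binary_ok(new, old) for old in partial):
                partial.append(new)
                extend(partial)
                partial.pop()  # backtrack and try the next branch

    extend([])
    return leaves


# Hypothetical toy problem: features are points on a line, and the only
# binary constraint is that inter-point distances agree within a tolerance.
MODEL = [0.0, 1.0, 3.0]
DATA = [5.0, 6.0, 8.0]  # the model translated by 5
TOL = 1e-6


def toy_unary(pairing: Pairing) -> bool:
    return True  # no unary information in this toy problem


def toy_binary(a: Pairing, b: Pairing) -> bool:
    data_dist = abs(DATA[a[0]] - DATA[b[0]])
    model_dist = abs(MODEL[a[1]] - MODEL[b[1]])
    return abs(data_dist - model_dist) <= TOL


result = interpretation_tree_search(len(DATA), len(MODEL), toy_unary, toy_binary)
```

In this toy problem the distance constraint alone prunes every branch except the one pairing each data point with its translated model point.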



As described, our search method will succeed only when all of the data features 
come from the object of interest. In general, object recognition must also work in 
the presence of clutter in the scene, in which much of the object may be hidden from 
view, and in which much of the data is spurious, coming from other objects. The 
tree search method can be straightforwardly extended to handle this by introducing 
into our matching vocabulary a new model feature, called a null character feature. 
At each node of the interpretation tree, we add as a last resort an extra branch 
corresponding to this feature (see Figure 4). This feature (denoted by a * to dis- 
tinguish it from actual model features Fj) indicates that the data point to which it 
is matched is to be excluded from the interpretation, and treated as spurious data. 
To complete this addition to our matching scheme, we must define the consistency 
relationships between data-model pairings involving a null character match. Since 
the data point is to be excluded, it cannot affect the current interpretation, and 
hence any constraint involving a data point matched to the null character is deemed 
to be consistent. 
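
One way to picture the null character extension is as a thin wrapper around an existing set of constraints. The sketch below is an assumption about how this could be coded, not a description of any cited system: `NULL`, `branches` and `with_null` are hypothetical names, and a pairing is represented as a (data index, model index) tuple whose second entry is `None` for the null character.

```python
from typing import Callable, List, Optional, Tuple

NULL = None  # stands for the null character "*"
Pairing = Tuple[int, Optional[int]]  # (data index, model index or NULL)


def branches(num_model: int) -> List[Optional[int]]:
    """Branches below any node: the m real model features, then "*" last."""
    return list(range(num_model)) + [NULL]


def with_null(
    unary_ok: Callable[[Pairing], bool],
    binary_ok: Callable[[Pairing, Pairing], bool],
):
    """Wrap constraints so any test involving a null match is deemed consistent."""

    def unary(p: Pairing) -> bool:
        return p[1] is NULL or unary_ok(p)

    def binary(a: Pairing, b: Pairing) -> bool:
        return a[1] is NULL or b[1] is NULL or binary_ok(a, b)

    return unary, binary
```

Placing the null branch last mirrors the "last resort" ordering in the text: real model features are tried before a data point is discarded as spurious.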







Figure 4. The interpretation tree can be extended by adding the null character * as a final 
branch for each node of the tree. A match of a data feature and this character indicates 
that the data feature is not part of the current interpretation. In the example shown, the 
simple tree of Figure 2 has been extended to include the null character. 



2. Previous results 

This method has been used for recognition in a variety of domains [Grimson &
Lozano-Perez 1984, 1987, Gaston & Lozano-Perez 1984, Murray 1987a, 1987b, Murray
& Cook 1988, Drumheller 1987]. Our empirical experience was that the method
was very efficient when all of the data features are known to have come from a single 
object. When spurious data is included, however, the method slows down by several
orders of magnitude. If methods for preselecting subspaces of the search space, such
as the generalized Hough transform [Ballard 1981], are added, the method improves 
in efficiency. By preselection, we mean that only some subset of the possible data- 
model pairings are used in the search process, and typically such subsets are chosen 
based on an expectation that they give rise to similar transformations of the model. 
If premature termination is added (i.e. halting the search process as soon as an 
interpretation that is "good enough" is found), the method improves even further. 
In an earlier combinatorial analysis [Grimson 1989], we showed that some of 
these empirical observations were supported by formal analysis. The main points of
this analysis are summarized below; formal statements of the main propositions are
included in the appendix for completeness.

• When all of the data features are known to have come from a single object, the
number of interpretations is asymptotic to 1.

• When only c of the s data features come from an object with m model features,
the number of interpretations $n_s^*$ is bounded above by an expression of order

$$O(n_s^*) = 2^{c} + [1 + \alpha]^{s} + 2ms[1 + p_2]^{c}$$

where $p_2$ is the probability of a pair of random data-model pairings satisfying
binary consistency, and $\alpha$ is a small (< 1) constant that depends on the object
characteristics and the amount of noise in the measurements. The number of
interpretations is bounded below by an expression of order

$$o(n_s^*) = 2^{c} + [1 + \beta]^{s} + 2ms[1 + p_2]^{c}.$$

• The expected probability of two random data-model pairings being consistent 
P2 is given by 

K 

P2= — 
\_m 

where k is a constant (usually less than 1) that can be derived from properties 

of the object and noise characteristics. The appendix provides details. 

• If all s sensory measurements are known to lie on a single object with m equal
sized features, the sensory data is distributed uniformly, and if the noise is small
enough, then the expected amount of search needed to find the interpretation
is bounded by

$$m^2 < N_s < m^2 + \alpha m s$$

where $\alpha$ is a constant that depends on the object characteristics and the amount
of noise in the sensory measurements.

• If c of the s sensory measurements lie on an object with m equal sized features,
the sensory data is distributed uniformly, and if the noise is small enough, then
the expected amount of search needed to find the interpretations is bounded
above by an expression of order

$$O(N_s^*) = m[1 + \gamma]^{s} + m s 2^{c} + \delta m^{6} + m^2 s^2 [1 + \mu]^{c}$$

and is bounded below by an expression of order

$$o(N_s^*) = m 2^{c} + m s$$

where $\gamma, \delta$ are constants that depend on the object characteristics and the
amount of sensor noise, $\gamma < 1$.

As we suggested in the introduction, these results show that constrained search is 
polynomial, in fact quadratic, when all of the data is known to come from a single 
object, but is exponential when spurious data is included. One way of reducing this 
exponential cost is to terminate the search as soon as an interpretation is found 
that is "good enough". In this paper, we consider the effects of this heuristic on the 
search process. 

3. Setting up the termination model. 

We define premature termination to be the process of stopping the search when an 
interpretation is found that is "good enough". We define our measure of goodness 
to be the number of data features included in an interpretation that are matched 
to a real model feature, and not the null character. Other definitions are possible, 
such as the fraction of an object's perimeter that is accounted for by the data, but 
for our purposes the simple counting of features suffices. Thus, we set a threshold 
on the size of an interpretation, and we will terminate the search as soon as we 
find a valid interpretation of that size. In [Grimson and Huttenlocher 1989], we 
consider the problem of how to properly select such a threshold so that there are no 
expected false positives. Here, we simply assume that any interpretation exceeding 
the threshold is a correct one. 
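
A minimal sketch of search with premature termination, under the feature-counting measure of goodness defined above, might look as follows. The function `search_with_termination`, the toy one-dimensional data, and the decision to omit unary constraints are assumptions made for brevity; only the stopping rule (halt at the first consistent interpretation with `threshold` real matches) is taken from the text.

```python
from typing import Callable, List, Optional, Tuple

NULL = None  # the null character "*"
Pairing = Tuple[int, Optional[int]]  # (data index, model index or NULL)


def search_with_termination(
    num_data: int,
    num_model: int,
    binary_ok: Callable[[Pairing, Pairing], bool],
    threshold: int,
) -> Optional[List[Pairing]]:
    """Return the first interpretation with `threshold` real matches, else None."""

    def extend(partial: List[Pairing], real: int) -> Optional[List[Pairing]]:
        if real >= threshold:  # "good enough": stop searching immediately
            return list(partial)
        level = len(partial)
        if level == num_data:
            return None
        for model_idx in list(range(num_model)) + [NULL]:  # "*" tried last
            new = (level, model_idx)
            # Constraints involving a null match are deemed consistent.
            if model_idx is not NULL and not all(
                old[1] is NULL or binary_ok(new, old) for old in partial
            ):
                continue
            partial.append(new)
            found = extend(partial, real + (model_idx is not NULL))
            partial.pop()
            if found is not None:
                return found
        return None

    return extend([], 0)


# Hypothetical toy scene: three model points on a line, translated by 5,
# plus one spurious data point at 7.3.
MODEL = [0.0, 1.0, 3.0]
DATA = [5.0, 6.0, 7.3, 8.0]


def toy_binary(a: Pairing, b: Pairing) -> bool:
    data_dist = abs(DATA[a[0]] - DATA[b[0]])
    model_dist = abs(MODEL[a[1]] - MODEL[b[1]])
    return abs(data_dist - model_dist) <= 1e-6


found = search_with_termination(len(DATA), len(MODEL), toy_binary, threshold=3)
```

In the toy scene the search stops as soon as three data points are consistently matched, with the spurious point assigned to the null character, and never visits the remainder of the tree.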

To see how termination can reduce the search process, consider a simple example.
Suppose we have a scene with s = 6 features, a model with m = 2 features, and
a threshold of t = 3. In principle, the constrained search method would examine a
tree of depth 6, the k-th level of which would have $(m + 1)^k$ nodes to be examined,
for a total of

$$\frac{(m+1)^{s+1} - 1}{m}$$

different nodes. Of course, many of these nodes would not be examined because
ancestor nodes in the tree would be inconsistent with the constraints, and the
subsequent subtree could be pruned. Nonetheless, consider what happens when
a threshold on search is included.

For simplicity, we consider the subtree below a node at the first level of the
tree. In this case, there are in principle

$$\frac{(2+1)^{6} - 1}{2} = 364$$

nodes to be explored in this subtree. In Figure 5, we show the subtree under a
node on the first level of the tree that would be searched when a threshold on
interpretation length is used to prune the tree. Notice that once we reach a node
with t assignments of data features to actual model features (i.e. not to the null
character), we can terminate further downward search. Similarly, once we reach a
node for which it is not possible to obtain a t interpretation, no matter what happens
to the remaining data features, we can again terminate further downward search.
In the case shown in the figure, only 64 nodes need to be explored, almost an order
of magnitude decrease in effort. As in the normal case, some of the nodes will not
be reached due to inconsistencies in the constraints, but we can clearly see that in
principle the number of nodes to be explored is reduced from the straightforward
case.
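
The node counts in this example can be checked with a short script. The sketch below assumes one particular counting convention: the root of the subtree (a first-level node matching the first data feature to a real model feature) is counted, and a node is counted only if a t-interpretation is still attainable from it. Under that assumption the recursion reproduces the figures of 364 and 64; the function names are hypothetical.

```python
from functools import lru_cache


def full_subtree_nodes(m: int, remaining: int) -> int:
    """Nodes in an unpruned subtree (root included) with `remaining` data
    features still to assign and m + 1 branches (m real, one null) per node."""
    return ((m + 1) ** (remaining + 1) - 1) // m


def terminated_subtree_nodes(m: int, s: int, t: int, level: int = 1, real: int = 1) -> int:
    """Nodes explored below (and including) a node at `level` that already has
    `real` real matches, when search stops at t real matches and nodes from
    which t real matches are unreachable are never visited (assumed convention)."""

    @lru_cache(maxsize=None)
    def count(real_matches: int, remaining: int) -> int:
        if real_matches + remaining < t:  # threshold unreachable: not explored
            return 0
        if real_matches == t or remaining == 0:  # threshold met: stop here
            return 1
        return (
            1
            + m * count(real_matches + 1, remaining - 1)  # m real branches
            + count(real_matches, remaining - 1)          # null branch
        )

    return count(real, s - level)


if __name__ == "__main__":
    print(full_subtree_nodes(2, 6))             # whole tree for m = 2, s = 6
    print(full_subtree_nodes(2, 5))             # subtree under a first-level node
    print(terminated_subtree_nodes(2, 6, 3))    # same subtree with threshold t = 3
```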




Figure 5. The portions of the interpretation tree under the first node that need to be 
explored using search termination. In this case, we have set m = 2, s = 6 and t = 3. The 
circles indicate nodes actually explored. 



4. The formal model 

We will derive results on the effects of prematurely terminating the search process in 
several steps. We begin by defining a formal model for the probability of consistency 
of a node in the tree. Given that model, we derive an explicit expression for the 
expected number of nodes searched in a tree. We then bound this expression, and 
use these bounds to derive simpler order of growth bounds on the expected search. 
These are summarized in the corollaries to Proposition 3, in which we show that the 
expected search is cubic in the parameters of the problem. 

We begin with the formal model for consistency. Since our method uses both 
unary and binary constraints, we need to model the probability that a data-model 
assignment is consistent and the probability that a pair of data-model assignments 
are consistent. 

Similar to our earlier analysis [Grimson 1989], we let $q_{i,I}$ denote the probability
that assigning the i-th data element to the I-th model element is consistent, and
we let $q_{i,j;I,J}$ denote the probability that the pair of assignments $i \mapsto I$, $j \mapsto J$ is
consistent. Our model of the recognition problem is defined as follows.

For a single data-model pairing, if the pairing is part of the correct interpretation,
the probability of consistency is simply 1. Similarly, any pairing involving the
null character is consistent with probability 1. If the pairing is not correct, we let
the probability of consistency be $p_1$. Thus, we have

$$q_{i,I} = \begin{cases} 1 & \text{if } i \mapsto I \text{ is correct,} \\ 1 & \text{if } I \text{ is the null character,} \\ p_1 & \text{otherwise.} \end{cases}$$

For a pair of assignments, suppose we are considering a match in which data
fragments i, j are paired with model fragments I, J respectively. We will model the
situation by saying that the consistency of this pair of pairs has probability 1 if
these pairings are part of the correct interpretation, or if either of them is assigned
to the null character. Otherwise we will assume that the probability of consistency
is $p_2$. Note that this is essentially assuming a random distribution of edges. It is
also assuming that pairs of model edges are distinctive, so that objects with partial
symmetries are excluded. Thus, we have

$$q_{i,j;I,J} = \begin{cases} 1 & \text{if } i \mapsto I,\ j \mapsto J \text{ is correct,} \\ 1 & \text{if either } I \text{ or } J \text{ is the null character,} \\ p_2 & \text{otherwise.} \end{cases}$$

Given a partial interpretation at a node, the probability of consistency is given
by

$$\prod_{i} q_{i,I} \prod_{i \neq j} q_{i,j;I,J}.$$

We can use the above definitions for q to derive an explicit expression for the expected
number of nodes in the tree.
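
As an illustration of this model, the sketch below evaluates the probability of consistency for a given partial interpretation. It is only a sketch under stated assumptions: the names are hypothetical, the true correspondence is passed in explicitly (with `None` marking spurious data), and the pairwise product is taken once per unordered pair of data features, which is one reading of the product over $i \neq j$.

```python
from itertools import combinations
from typing import List, Optional

NULL = None  # the null character; also marks spurious data in `truth`


def unary_q(assigned: Optional[int], true_match: Optional[int], p1: float) -> float:
    """q for a single pairing: 1 if null or correct, p1 otherwise."""
    if assigned is NULL or assigned == true_match:
        return 1.0
    return p1


def binary_q(a_i, a_j, t_i, t_j, p2: float) -> float:
    """q for a pair of pairings: 1 if either is null or both are correct."""
    if a_i is NULL or a_j is NULL:
        return 1.0
    if a_i == t_i and a_j == t_j:
        return 1.0
    return p2


def node_consistency_probability(
    assigned: List[Optional[int]],
    truth: List[Optional[int]],
    p1: float,
    p2: float,
) -> float:
    """Product of unary and (unordered) pairwise q values at a node."""
    prob = 1.0
    for a, t in zip(assigned, truth):
        prob *= unary_q(a, t, p1)
    for i, j in combinations(range(len(assigned)), 2):
        prob *= binary_q(assigned[i], assigned[j], truth[i], truth[j], p2)
    return prob


# Example: four data features; the first is matched correctly, the second and
# fourth are matched to real but incorrect model features, the third is null.
example = node_consistency_probability(
    assigned=[0, 2, NULL, 1], truth=[0, 1, NULL, NULL], p1=0.5, p2=0.1
)
```

In the example two pairings are incorrect and three pairs of real pairings involve at least one incorrect match, so the result is $p_1^2 p_2^3$.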

Proposition 1: Assume that the data features that actually arise from the
object of interest are uniformly interspersed among the spurious features, occurring
with frequency

$$\delta = \frac{c}{s}.$$

Assume we are given a partial interpretation based on $\ell - 1$ data features, of which u
are correctly assigned, the remaining $\ell - u - 1$ being matched to the null character.
If we assign the next data feature to a real, but incorrect, model feature, then
the number of nodes below this point in the tree that will on average be explored,
denoted by $W(s,u,\ell)$, is given by

i— n .— n V / 

Y, ( • W+vi' +1)-L5(t+1)J 24 2 H 2 } 

t-s+e+k-u-i} ^ ' 

i=max{0,t-s+£+k-u} ^ ^ 
A proof of this proposition is deferred to the Appendix.

We can check the correctness of this expression by setting $p_1$ and $p_2$ to 1.
Applying the resulting expression to the case of $u = 0$, $\ell = 1$, $m = 2$, $s = 6$, $t = 3$, as
shown in Figure 5, yields the correct result of 64 nodes.

We can use Proposition 1 to establish bounds on the search of such a subtree. 
This is done in the following proposition: 

Proposition 2: The expression of Proposition 1 can be bounded by: 

w(s,uj) < PlP i \{t - u) jo(u)+1 y *")- 1 

+ ( 5 _ t _ t + u )(t - u - 1) \{s - £ - 2)/i] io(u>s -*- 1) X 

_ < 2 +<(2u-l) 

x ( 1 + mp\- %" 



and by 

W(s, u,£) > p!pl u [s-t + u-t + 2] 



2u-S(2u-3)-6 2 +2 



where 



1_5 f(u) 

fi = mp\ 6 p J 2 K ' 

., , 2«(1 - S) + 2 + 6 - 6 2 
/(«) = j 

2u(l-<)+4+<-3< 2 

i/ = mp}~V 2 2 



The previous claim gives us bounds on the expected search of a particular
subtree. How do we use them to bound the search of the whole tree? Under the
assumption that the c correct data features are uniformly interspersed throughout
the full set of s data features, we can see that at the top level of the tree, we must
search m subtrees, with $u = 0$, $\ell = 1$, that is, for each possible assignment of the first
data feature to a real model feature, we must explore the appropriate subtree. Since
the first data feature is not part of the true object, once we have exhausted these
subtrees, we must move on to interpretations that exclude the first data point, by
considering the portion of the tree below the node that pairs the first data point to
the null character. Under this node, we consider pairings of the second data feature.
Again, we must consider m subtrees, with $u = 0$, $\ell = 2$. We continue this process
until we reach level $\ell = 1/\delta$. In this case, we have a data feature that does have a
correct match, and on average this will be found after we have searched $m/2$ subtrees
at this level. We then repeat this process below this node in the tree, with $u = 1$,
and so on. Hence, the expected total amount of search is given by:




* 1 



i=i 



+ ^W(s,j,- 6 (j+l)) + l 



To obtain bounds on this expression, we simply need to substitute from Proposition 
2, and simplify. 



Proposition 3: Given a uniform distribution of correct data features among 
the spurious, and given the previously derived expression for the binary probability 
of consistency: 

»-{*■)'. 

the total amount of search expected is bounded by 



W(s)<t] + ^ 



*i"+V*- 1 (i + (*-i)^) 






yo[( d _i)( t _i) + c ] 

y 2 : 



l-pt s) 



m 



P2JI-P2) 

.(1-P2) 2 1-P2 



+/9 



i3-S) {1 _ p (S- S )t ) tp( 3-S)t 



(1-P2- 5 ) 2 



i-pI~ s 



where 



7 = ( s - 3)/* 

to = [(* - 3)/i - lj 

Jo = [« 2 - lj 

c =*-'-K* +i ) 



and by 



W'(s) > 7 + mpt 




"'J' 1 - 



j -pi' Mv-tf) , btpi- 

1 „2 '- o\o+ 



Pi 



(i-pjy i-pi 



+a a \ rt -6 P2 



„(3-«)*> 



i-^- 5) 



(i-pr o;t ) + ^ 



,(3-*)« 



(!-#-*>) 



l-ri 3 - 5) . 
where



a = (s 






b >6 



1-6 ^4^ 
a = mp 1 p 2 



I 

The key results follow from this proposition. 



Corollary 3.1: The order of magnitude of the expected search is given by: 

o(W(s)) — ms- 
and by 

0(W(s)) = amts-[l + — \ (k 2 — ) 
c \ mj \ m/ 

where a is a small constant. 

Proof: For both bounds, we can simply identify the dominant term, substitute 
in the bounds on the variables, yielding 



o(W(s)) = m(s-t + 2 -■?-)- (19) 

V 2c/ c 

0m s)) = mt( S -t----y c (l + -) (J-) 



The simpler expressions follow. | 



Corollary 3.2: If the scene clutter and the noise in the data is such that 

m k 2 
then premature termination has an expected search that is of order 

0(W(s)) = amts- 
c 

and 

o(W(s)) = ms-. 
c 

Proof: If the conditions hold, then the exponent in the previous corollary becomes 

and the upper bound on the search reduces to 

°W» = "*(-'-5-s)K 1 + £" 

The simplification follows. | 

5. Implications of the results

Several interesting conclusions may be drawn from the above analysis. First, we note
from Corollary 3.1 that we cannot guarantee that terminated search will result in
a polynomial time algorithm, as the upper bound is still exponential. At the same
time, however, the search is clearly reduced, as both the base and the exponent are
much reduced.

Corollary 3.2 provides a fascinating result, however, since it indicates conditions
on the problem under which the expected search does become polynomial, basically
implying that if the scene clutter is small enough, a polynomial algorithm results.
This has two interesting implications. If the number of scene features is small enough
relative to the size of the model, it implies that terminated search will perform well.
When the scene clutter increases, however, we must provide some form of grouping
or selection to reduce the number of scene features actually considered in the search
below

$$s < \frac{2m}{\kappa^2}.$$

Here, selection means isolating a subset of the data features most of which are 
believed to have come from a single instance of a known object. 

This nicely extends our earlier results on the role of selection in efficient object 
recognition. The results of [Grimson 1989] imply that for pure constrained search, 
knowing that all of the data features are from a given object will reduce the expected 
search to the polynomial domain, but general constrained search remains exponen- 
tial. This suggests that our selection mechanism must be very accurate at selecting 
out subsets of the data features for consideration, since if even one spurious point 
is included we must either use an exponential search method, or tolerate having the 
entire subset of data features being rejected. When premature search termination is 
added, however, Corollary 3.2 implies that we can tolerate considerably more uncer- 
tainty on the part of the selection process and still have an efficient search method. 
We simply require that the selection method allows an amount of spurious data that 
is bounded by the conditions of Corollary 3.2. 

Also note that both Proposition 3 and its corollaries involve the constant
$\kappa$, which is determined by properties of the object model and the sensing
system. In particular, $\kappa$ increases with increasing noise in the sensory
data, and this, as expected, implies both that the amount of expected search will
increase, and that the amount of spurious data that can be tolerated, while
maintaining a polynomial algorithm, decreases. Standard values for $\kappa$ are on
the order of

$$\kappa \approx 0.2\,\frac{P}{D}$$

where $P$ is the total perimeter of the object (for the case of 2D objects) and $D$
is the dimension of the image. Given this, we see that our conditions for a
polynomial search are that

$$s < \frac{2m}{\kappa^2} \approx 50m \left(\frac{D}{P}\right)^2$$

so that if the object is of a size on the order of the image ($P \approx D$),
considerable amounts of spurious data are still tolerable, while maintaining a
polynomial search algorithm.
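
As a rough numerical illustration of this condition, the sketch below evaluates the bound $2m/\kappa^2$ using the rule of thumb $\kappa \approx 0.2\,P/D$ quoted above. The function name `max_tolerable_features` and the default coefficient argument are assumptions made for illustration; they are not part of the original analysis.

```python
def max_tolerable_features(m, perimeter, image_dim, coeff=0.2):
    """Upper limit on the number of scene features s for polynomial search.

    Uses kappa ~ coeff * P / D (the rule of thumb from the text) and the
    condition s < 2 m / kappa**2. The name and default are illustrative
    assumptions.
    """
    if m <= 0 or perimeter <= 0 or image_dim <= 0:
        raise ValueError("m, perimeter and image_dim must be positive")
    kappa = coeff * perimeter / image_dim
    return 2.0 * m / kappa**2


if __name__ == "__main__":
    # Object about the size of the image (P = D): the bound is about 50 m.
    print(max_tolerable_features(m=10, perimeter=1.0, image_dim=1.0))
    # Doubling P/D shrinks the tolerable clutter by a factor of four.
    print(max_tolerable_features(m=10, perimeter=2.0, image_dim=1.0))
```

Because the bound scales as $(D/P)^2$, small objects in large images tolerate far more clutter than objects that fill the image.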

5.1 Comparing search results 

To more directly compare the results derived here, we can consider some earlier
analysis of constrained search in object recognition. In [Grimson, 1989], we analyzed
the combinatorial behavior of the constrained search approach, and showed two major
results. The first is that if all of the data are known to have come from a single
object, so that we need not use the null character to exclude spurious data, then
the amount of search is bounded by

$$m^2 \le W_{\text{no-occ}} \le p_1 m^2 + (1 + p_1\kappa)\, m s.$$

Hence the search process is polynomial in this case. 

If, however, spurious data is included, we showed that the search is exponential. 
In our earlier analysis, we did not use any assumptions on the distribution of the 
correct sensory data features in the search process. To more directly compare the 
two methods, below we derive bounds on the constrained search process under the 
assumption of uniform distribution of the correct data features. 
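
To give a feel for the gap between the two regimes, the sketch below evaluates a polynomial bound of the form $m^2 + \alpha m s$ (the form given for the clutter-free case in Proposition 6 of the appendix) next to the exponential lower bound $m\,2^c$ that appears in Proposition 4. The value of `alpha` and the choice $s = 2c$ are arbitrary assumptions used only to show the growth rates.

```python
def clutter_free_upper_bound(m, s, alpha):
    """Polynomial bound m**2 + alpha*m*s for data from a single object."""
    return m**2 + alpha * m * s


def cluttered_lower_bound(m, c):
    """Exponential lower bound m * 2**c for standard constrained search."""
    return m * 2**c


if __name__ == "__main__":
    m, alpha = 20, 1.0  # alpha = 1.0 is an arbitrary illustrative value
    for c in (5, 10, 20):
        s = 2 * c  # assumption: half of the data features are spurious
        print(c, clutter_free_upper_bound(m, s, alpha), cluttered_lower_bound(m, c))
```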

Proposition 4: If the sensory data arising from a correct interpretation are 
uniformly distributed among the spurious data, then the amount of search expended 
by the normal constrained search method is bounded by 



m-2 c < W occ < m-2 c + - [1 + e]* 

C C € 



1+ * 



1+6 



-1 1 

m°s s r , lC 

+ — - 1 + P2 
K l C 



With these results, it is clear that premature termination of the search process 
can significantly reduce the work involved in locating an object. From Corollary 
3.2, we know that if the scene clutter is small enough, the expected search reduces 
to order 

e c 

mS- < Wt erm < mtS-. 
C C 

This is clearly significantly smaller than the expressions in Proposition 4.

As a consequence, the main conclusion we can draw is that premature termi- 
nation of a constrained search method can dramatically reduce the expected search 
required to recognize and locate objects in cluttered noisy data. To obtain poly- 
nomial time algorithms for recognition, we must keep the ratio of scene clutter 
to object size below a well defined bound, and this implies that for significantly 
cluttered scenes, some type of grouping or selection mechanism is needed to select 
out subsets of the data features that are likely to include a subset arising from an 
instance of a known object. 
Appendix

In this appendix, we present formal proofs of the propositions stated in the main 
text. 

We begin by stating the main propositions from earlier analysis in Grimson 
[1989]. (Note that the numbers of the propositions refer to the numbers used in 
that article.) In that analysis, we first derived bounds on the number of consistent 
interpretations, both in the case of data known to have come from a single object, 
and in the case of spurious data. 

Proposition 1 [Grimson, 1989]: If all of the k sensory measurements are 
known to lie on a single object with m features, then the number of interpretations 
nk is bounded by 

«fc < [1 + (m - l)pip 2 2 J . 
and by 

k k(k-i) i2 



n k > 1 + 



P2 2 -P2 2 



P2 +Pi(m- 1) 

where $p_1$ is the probability of a random data-model assignment satisfying unary
consistency, and $p_2$ is the probability of a pair of random data-model assignments
satisfying binary consistency.

Proposition 2 [Grimson, 1989] : Given an object with m faces and given 
k sensory data points, of which c actually lie on the object, the number of interpre- 
tations n£ is bounded by 

n* k < 2 C - [1 + p 2 ] °+[l + mp lP l] k - c [p 2 + l + mp lP l] c 

+ m Pl [1 - p£p] [1 + p 2 ] c_1 [k + p 2 (k - c)] 
and by 

nt>2 c -[l + p^] c +[l + (m-l)p 1 p 2 ^ X ] k - c [l + (m-l)p 1 pp 1 +p^] c 
+ Pl (m - 1) [1 + p 2 ] c_1 [k +p 2 (k- c)] 

- Pl (m - l)p^ [1 + p^} c " 1 [* + P7^{k - c)} 

where $p_1$ is the probability of a random data-model assignment satisfying unary
consistency, and $p_2$ is the probability of a pair of random data-model assignments
satisfying binary consistency.
To obtain order of magnitude expressions on the amount of search required to
find these interpretations, we need to relate the probability of consistency to aspects 
of the problem. We established that the probability of consistency is inversely 
proportional to the number of model features, for a fixed amount of sensor noise 
and a fixed size object model: 

Proposition 3 [Grimson, 1989]: Given a two dimensional object with m 
equal sized edges of length L, and given sensory data that is distributed uniformly 
in transform space with a uniform distribution of lengths, the expected probability 
of two random data-model pairings being consistent, P2, is given by 

K 



n = 



m 



where 



r\> — iSiii) — 



in the worst case, and 



K — Kit — 



lie* 


w(e* p y + 2e;(l - h*) 


l + SinCa (l h*f 

7T 


P 


h 


w(e;f + e;(l-h*) 


+ ^d-*-)» 


P' 



in the uniform distribution case, and where e a is a bound on the error in measuring 
orientation, e p is a bound on the error in measuring position, h is the minimum 
length data edge, e* = ^, h* = ■£, P is the perimeter of the object, and D is the 
dimension (width) of the image. | 

To illustrate the range of values for this constant, in Figure 6, we list the values
for $\kappa_u$ for a range of values of $\epsilon_p^*$ and a range of values of $P/D$. We fix
$h^* = 2\epsilon_p^*$ and $\epsilon_a = \tan^{-1} 2\epsilon_p^*$. As expected, the constant increases with
increasing noise, and as the size of the object increases.



$P/D$ =                 .125    .25     .5      1       2       4       8
$\epsilon_p^* = .01$    .002    .004    .008    .016    .033    .065    .131
$\epsilon_p^* = .1$     .021    .042    .085    .169    .338    .677    1.354
$\epsilon_p^* = .5$     .111    .222    .443    .886    1.772   3.545   7.090

Figure 6. Values for the constant $\kappa_u$ for a range of values of $\epsilon_p^*$ and a range of values of
$P/D$. We fix $h^* = 2\epsilon_p^*$ and $\epsilon_a = \tan^{-1} 2\epsilon_p^*$.
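
The tabulated values appear to grow almost linearly with $P/D$ at each noise level. The sketch below checks this by dividing each entry of Figure 6 by its $P/D$ value; the variable and function names are illustrative assumptions, and the numbers are simply the table entries.

```python
# Columns of Figure 6 (P/D) and rows (normalized position error).
P_OVER_D = [0.125, 0.25, 0.5, 1, 2, 4, 8]
KAPPA_U = {
    0.01: [0.002, 0.004, 0.008, 0.016, 0.033, 0.065, 0.131],
    0.1: [0.021, 0.042, 0.085, 0.169, 0.338, 0.677, 1.354],
    0.5: [0.111, 0.222, 0.443, 0.886, 1.772, 3.545, 7.090],
}


def slope_range(table=KAPPA_U, ratios=P_OVER_D):
    """Return, per noise level, the (min, max) of kappa_u / (P/D)."""
    result = {}
    for eps, row in table.items():
        slopes = [kappa / ratio for kappa, ratio in zip(row, ratios)]
        result[eps] = (min(slopes), max(slopes))
    return result


if __name__ == "__main__":
    for eps, (low, high) in slope_range().items():
        print(f"eps_p* = {eps}: kappa_u / (P/D) between {low:.4f} and {high:.4f}")
```

For the middle row the ratio stays near 0.17, which is in line with the rule of thumb that $\kappa$ is on the order of $0.2\,P/D$.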

A similar result holds for three dimensional objects. This result can be used to
establish the following two sets of bounds on the amount of search involved.

Proposition 6 [Grimson, 1989]: If all of the $s$ sensory measurements are
known to lie on a single two-dimensional object with $m$ equal sized edges of length $L$,
$m > 3$, the sensory data is distributed uniformly in transform space, with a uniform
length distribution, and if the noise is small enough, then the expected amount of
search needed to find the interpretation is bounded by

$$m^2 \le N_s \le m^2 + \alpha m s$$

where $\alpha$ is a constant that depends on the object characteristics and the amount of
noise in the sensory measurements.

Proposition 9 [Grimson, 1989]: If $c_0$ of the $k$ sensory measurements lie on
a two-dimensional object with m equal sized edges of length L, the sensory data is 
distributed uniformly in transform space, with a uniform length distribution, and if 
the noise is small enough, then the expected amount of search needed to find the 
interpretations, for m large, is bounded by 

'[1 + P1K) S 



N* <m 



p X K 



+ 2 Co [s-co + l] 



+ Pim 



-2 + [1 + a] 



CO 



a 



[0 - CO 



+ 



CO 



a(l + a) 



n; >m 



2 C °+ 1 + s - c - 3 



where 



a = 



m* 



and where $\kappa$ is a constant that depends on the object characteristics and the amount
of sensor noise, and $p_1$ is the probability of a random data-model assignment
satisfying unary consistency.



Given these results as a basis, the text of the paper presents a similar analysis for 
the case of premature termination. The main results, with proofs, are summarized 
below. 



Proposition 1: Assume that the data features that actually arise from the
object of interest are uniformly interspersed among the spurious features, occurring
with frequency

$$\delta = \frac{c}{s}.$$

Assume we are given a partial interpretation based on $\ell - 1$ data features, of which $u$
are correctly assigned, the remaining $\ell - u - 1$ being matched to the null character.
If we assign the next data feature to a real, but incorrect, model feature, then
the number of nodes below this point in the tree that will on average be explored,
denoted by $W(s, u, \ell)$, is given by



Pi 



;— u— 1 k /, \ 
jfc=0 t=0 v ' 






19 



s-e-i 



t-u-2 



{o,t-s+e+k-u-i} v ' 

v^ 2 ( h ~ x \ i <-l«j C' + " a +1 )-(" + i" J 



+0-W+DJ ( i+ " 2 +2 )-(" +L4( 2 i+1)J ) 



J P 2 



k=t—u \«=max 
t-u-2 



) 



»=max{0, 



(1) 



Proof: We can see how this sum arises by the following argument. First, the 
probability that this assignment satisfies the unary constraints is given by $p_1$, which
multiplies the remaining summations. Since we already have a ^-interpretation in 
hand, and since we are assigning the next data feature a non-null character, we must 
explore the next t — u — 1 levels in detail. Hence the first sum in the expression 
counts the number of nodes in this case. The summation over k counts the number 
of nodes at each succeeding level, and the summation over i counts the nodes at a 
particular level, by considering the number of features assigned a non null character. 
For $i$ such features, there are $\binom{k}{i}$ different ways of selecting them, and for each one,
there are m possible assignments. To determine the consistency, we multiply by the 
probability of applicable unary and binary constraints holding true. Note that the 
exponent for the unary constraint probability counts those data-model assignments 
that are not correct. The exponent for the binary constraint probability counts the 
total number of possible pairs of data-model assignments, minus those involving 
only correct assignments. 
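
The counting step in this argument can be checked by brute force. The sketch below enumerates every assignment of $k$ data features to one of $m$ model features or to the null character, before any consistency constraints are applied, and groups them by the number $i$ of non-null assignments; each group should contain $\binom{k}{i} m^i$ assignments. The function name is an illustrative assumption.

```python
from itertools import product
from math import comb


def count_by_non_null(k, m):
    """Count unconstrained assignments of k data features, grouped by the
    number of features matched to a real (non-null) model feature."""
    labels = list(range(m)) + [None]  # None plays the role of the null character
    counts = [0] * (k + 1)
    for assignment in product(labels, repeat=k):
        non_null = sum(label is not None for label in assignment)
        counts[non_null] += 1
    return counts


if __name__ == "__main__":
    k, m = 4, 3
    counts = count_by_non_null(k, m)
    expected = [comb(k, i) * m**i for i in range(k + 1)]
    print(counts == expected, sum(counts) == (m + 1) ** k)
```

In the proposition these raw counts are then weighted by the unary and binary consistency probabilities, which is what keeps the expected number of explored nodes well below $(m+1)^k$.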

Once we have reached the level in the tree at which the first possible t interpre- 
tations may occur, our search narrows. In particular, we need not consider exploring 
portions of the tree for which interpretations of size larger than t are involved, and 
we need not consider exploring portions of the tree for which interpretations of size 
t are impossible. The remaining two summations count these cases, the first one 
counting those cases in which the most recently assigned data feature has been given 
a non null character, and the second one counting those cases for which the most 
recently assigned data feature has been matched to the null character.
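
The search being counted here can be sketched as a depth-first traversal with a null character, a termination threshold $t$, and pruning of branches that can no longer reach $t$ matches. The code below is a toy version under stated assumptions: features are 1-D positions, the only constraint is a binary one on signed offsets, and the function name and tolerance are hypothetical. It is not the recognition system analyzed in the paper.

```python
def terminated_search(data, model, t, tol=1e-6):
    """Depth-first interpretation-tree search with early termination.

    data, model: lists of 1-D feature positions (a toy stand-in).
    Returns (interpretation, nodes_explored); interpretation is a list
    with one model index or None (null character) per data feature, or
    None if no interpretation with t matches exists.
    """
    nodes = 0

    def consistent(interp, j, b):
        # Binary constraint: offsets between data features must match
        # offsets between the model features they are paired with.
        for i, a in enumerate(interp):
            if a is None:
                continue
            if abs((data[j] - data[i]) - (model[b] - model[a])) > tol:
                return False
        return True

    def descend(interp, matched):
        nonlocal nodes
        j = len(interp)
        if matched >= t:
            # Good enough: stop and pad the rest with the null character.
            return interp + [None] * (len(data) - j)
        if matched + (len(data) - j) < t:
            return None  # t matches are no longer reachable below this node
        for b in list(range(len(model))) + [None]:
            nodes += 1
            if b is not None and not consistent(interp, j, b):
                continue
            found = descend(interp + [b], matched + (b is not None))
            if found is not None:
                return found
        return None

    return descend([], 0), nodes


if __name__ == "__main__":
    model = [0.0, 1.0, 3.0, 7.0]
    # Model shifted by 10, with two spurious features (42 and 99) mixed in.
    data = [10.0, 42.0, 11.0, 13.0, 99.0, 17.0]
    print(terminated_search(data, model, t=4))
```

Trying real model features before the null character at each level mirrors the order of exploration described in the proofs.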



Proposition 2: The expression of Proposition 1 can be bounded by: 

W(s,u,£) < PlP % \(t - «)io(»)+i At io(«)-i 

+ ( s - £ - t + u)(t - u -l)[(s-l- 2)n] ioiu '*- e - 1) X 

/ !_ 5 2«.+ l- ' a +«ff- 1 > Y 

X ( l + mp\ d p 2 ) ■ 

and by 

W(s,u,£) > pxp\ u [s-t + u-£ + 2] l-(t- u)mp\- s pf ^ 



(10) 



(16) 
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where 

H = mp 1 p J 2 K 

.. . 2u(l-6) + 2 + 6-S 2 
/(«) = 2 

i/ = m^ p 2 2 

Proof: To use equation (1), we want to obtain closed form bounds on the sums. 
We begin with an upper bound. 

Consider the first sum in equation (1). First, we can use

$$\delta i - 1 < \lfloor \delta i \rfloor \le \delta i$$

to remove the dependence on $\lfloor \cdot \rfloor$. Second, since $p_2 < 1$, we can get an upper bound
on the expression by replacing the resulting exponent for $p_2$ with a linear expression
in $i$ that is smaller than the current exponent, in particular by replacing terms in
$i^2$ by similar terms in $i$. This leads to the upper bound for the first summation of

t-u-l 

where 

1_{ flu) 

.. 2tt(l - 6) + 2 + S - 6 2 

/(«) = 5 ' 

We can simplify this by using the geometric progression, to yield: 

This is still an exponential, albeit a small one. We can reduce this further, by 
observing that 

$$[1 + \mu]^{t-u} = \sum_{j=0}^{t-u} \binom{t-u}{j} \mu^j \qquad (3)$$

and asking when the largest term occurs. In general, the terms of

$$\sum_{j=0}^{m} \binom{m}{j} \epsilon^j$$

will reach a maximum for the smallest $j$ such that the $j$th term is larger than the
$(j+1)$st term. This implies

$$j + 1 > (m - j)\epsilon$$

or equivalently that the index for the largest term is

$$j = \left\lfloor \frac{m\epsilon - 1}{1 + \epsilon} \right\rfloor. \qquad (4)$$

In our particular case, we let

$$j_0 = \left\lfloor \frac{(t-u)\mu - 1}{1 + \mu} \right\rfloor.$$
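
The largest-term condition can be checked numerically. The sketch below finds the smallest $j$ satisfying $j + 1 > (m - j)\epsilon$ and compares it with a brute-force search over the terms $\binom{m}{j}\epsilon^j$. The function names and the sample values of $m$ and $\epsilon$ are illustrative assumptions, chosen to avoid ties between adjacent terms.

```python
from math import comb


def largest_term_index(m, eps):
    """Smallest j with j + 1 > (m - j) * eps, i.e. the first index at which
    the terms comb(m, j) * eps**j stop increasing."""
    for j in range(m + 1):
        if j + 1 > (m - j) * eps:
            return j
    return m


def brute_force_argmax(m, eps):
    """Index of the largest term comb(m, j) * eps**j, by direct evaluation."""
    terms = [comb(m, j) * eps**j for j in range(m + 1)]
    return max(range(m + 1), key=terms.__getitem__)


if __name__ == "__main__":
    for m, eps in [(10, 1.0), (20, 0.3), (50, 0.05)]:
        print(m, eps, largest_term_index(m, eps), brute_force_argmax(m, eps))
```

When $m\epsilon$ is small the largest term sits at a very low index, which is why bounding every term by the largest one gives a small polynomial rather than an exponential.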
An examination of /z under the limits on 6 and m shows that 

m Pip2 < M ^ "*P2- 
In [Grimson 1989], we showed that if the data features are randomly distributed,
then the probability of binary consistency is given by

$$p_2 = \left(\frac{\kappa}{m}\right)^2$$

where $\kappa$ is a constant that depends on the characteristics of the object model and
the amount of sensor noise. Substituting, we see that

K 2 

<u< — . 
m 

Hence 

j < |« 2 -ij. 

Using equation (3), we have 

[1 + M]'-" <l + (*-«) ('"")*.* 

< 1 + (< - u) io+ V° 
and substitution into equation (2) implies that an upper bound on the first summa- 
tion in equation (1) is given by 

p£(t -«)*>(")+ V o(tt)-1 . (5) 

Now consider the second summation in equation (1). This is bounded above by 

s-e-i t-u-2 



«+i)-L«(«+i)j„l 2 )-(. 2 ; /g\ 



e e (Yy + v/ +i) -^ +i)j ^ 

k=t-u i=0 ^ ' 

We can use the same method as above, replacing the exponent for pi with a smaller 
exponent linear in i, which yields an upper bound on expression (6) of 

fc=t-u »=0 ^ ' 

where 

, , 2u(l-6) + 4 + S-3S 2 

S(») = 2 • 

If we let 

v = mp\~ s p% 
then a similar analysis to the first case indicates that the largest term occurs for 
index 

i (u,k)=[ y i+ ^ J. 



22 

This allows us to reduce the bound for equation (6) further, to 

£ mp\- S P T + a (< - « - 1) [(* - l)^] ,0( "' fc) • (7) 

fc=t— u 

Now the maximum value for $\nu$ is the same as the maximum for $\mu$, namely

$$\nu \le \frac{\kappa^2}{m}.$$

Hence, $k\nu$ can be on the order of $\frac{s}{m}\kappa^2$, which in general will be larger than 1. This
implies that the largest term in the summation in equation (7) will occur for $i_0$ as
large as possible, and this leads to the following bound for the second summation
in equation (1):

mp\- S pT +1 ~ S ^"'^(s -i-t + u)(t -u-l)[(s-£- 2)i/]'' o(u '— ' _1) . (8) 

A similar analysis of the third summation yields a bound of 

(s - I - t + u)(t -u- \)pl [{s-t- 2)n] il(u ' s - i - 1) , (9) 

where

$$i_1(u, k) = \left\lfloor \frac{(k-1)\mu - 1}{1 + \mu} \right\rfloor.$$

By piecing together equations (5), (8) and (9), and by noting that $\nu \le \mu$, we have
as an upper bound:

W(s,u,£) < PlP % \(t - u yo(u)+i ^(u)-! 



+ ( s -e-t + u)(t -u-l)[(s-£- 2)v] io(u ' s - i - 1) 



X 



x f 1 + mp\ b p 2 2 1 



(10) 



Now consider a lower bound on equation (1). Consider the first sum: 

'fEf^^F'-"' 1 (id 

fc=0 j=0 ^ ' 

To reduce this expression, we need to replace the exponent for $p_2$ with a larger
expression linear in $i$. Replacing $\lfloor \delta i \rfloor$ with $\delta i - 1$, and replacing quadratic terms in
$i$ with linear ones, we get a lower bound on equation (11) of

' Y, E (!) ^pJ^pPp^ 1 (12) 

fc=0 t=0 ^ ' 

where 

_ k(l - 6 2 ) + 2u + 1 + 26(1 - u) + S 
nyu,k) — . 

By Vandermonde's Binomial Theorem, this reduces to

^"'"fjl + mpl 1 -^]'. 

fc=0 
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Since we are seeking a lower bound on this expression, we note that the term in the 
summation is always greater than 1, and we obtain as a lower bound for equation 

(12): 

Pip 2 2 u (t-u). (13) 

Now, consider the second summation in equation (1) 

*~e z~ 2 (*-V + vs <+,, - ls<i+,, M'" + * ) - r ''" ,,J) . 

k=t-ui=m&x{0,t-s+e+k-u-l} ^ ' 

We can bound this below by only considering terms for which $i$ runs from 0 to
$t - u - 2$:

s+u+l-t-i 
fc=t— u 



e {" - i )m'«rf + "- ij, ' +i,j P r" +j) - ( ' +i, " ,> ' 



As in the previous case, we can obtain a new lower bound, by replacing the exponent
for $p_2$ with a larger expression that is linear in $i$. Using the same method as above,
this leads to a lower bound on the second summation of



2 _ s 3o _!I2a^i±£ 



mp 1 p 2 



(s-2t + 2u-£ + 2). 



(14) 



Similarly, the third summation can be bounded below by the same methods by 

Pipl u (s-2t + 2u-t + 2). (15) 

By piecing together equations (13), (14) and (15), we obtain 

W(s, u, I) > pip| u_1 [s - t + u - I + 2] 



2u-<(2u-3)-<" ! +2 

1 - (t - u)mp\- 6 p 2 



(16) 



Proposition 3: Given a uniform distribution of correct data features among
the spurious, and given the previously derived expression for the binary probability
of consistency:

$$p_2 = \left(\frac{\kappa}{m}\right)^2,$$

the total amount of search expected is bounded by



"MS'J + T 



V 0-1 ( 



i JO+ V 0-1 ( 1 + (t - 1) 



m' 



'V 1 -^ l~P { 2 \ 



-yo [{d _ 1)(t _ 1) + c] 



( 



+fi 



„(3-«) 



(3-5)ts 



P\ -°'^-P\ ') 
{l-p\- S f 



P2{l-P\) 
{1-P2? 

tpt 6)t 1 


1 

)] 


tp\ 

-P2. 



(18) 
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where 



and by 



W(s) > 7 + m Pl 
o 



7 = (5 - 3)n 

io = [(s - 3)fi - lj 

jo = [« 2 - lj 

/? = mp 1 p~ "* 

e =-*-IG +i ) 



PV ^(l-PJ*) , &*p2« 



(3 - 5) (l-p< 3 - 5)< ) 



+a g 1 ft, . - b P2 



l~pt S) 



(i-p?-y 



+ 



bt P ?- s » ' 
i- P f- s \ 



.(17) 



where 



•-(•-«G-!H(H 



Si-^+2 



Proof: The previous claim gives us a lower bound on the expected search of 
a particular subtree. How do we use it to bound the search of the whole tree? 
Under the assumption that the c correct data features are uniformly interspersed 
throughout the full set of s data features, we can see that at the top level of the tree, 
we must search $m$ subtrees, with $u = 0$, $\ell = 1$; that is, for each possible assignment
of the first data feature to a real model feature, we must explore the appropriate
subtree. Since the first data feature is not part of the true object, once we have
exhausted these subtrees, we must move on to interpretations that exclude the first
data point, by considering the portion of the tree below the node that pairs the first
data point to the null character. Under this node, we consider pairings of the second
data feature. Again, we must consider $m$ subtrees, with $u = 0$, $\ell = 2$. We continue
this process until we reach level $\ell = 1/\delta$. In this case, we have a data feature that
does have a correct match, and on average this will be found after we have searched 
y subtrees at this level. We then repeat this process below this node in the tree, 
with u = 1, and so on. Hence, the expected total amount of search is given by: 

t-i /T*- 1 1 1 , 
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To derive a lower bound on this expression, we can substitute from equation (16), 
and execute the summation over i to obtain: 

W(s) >t(j)+ mm £ £ p\i [a - bj] [l + a(t - jrf - ^" 



j=0 8=1 



where 



-<-«*«(i-iKG-0 



a = mp\- i p 2 2 . 

To further simplify this expression, we can use the identity for the geometric 
progression: 



t-i 



9 ~9 






and a derivative of this to yield: 



t-i 



to get 



W(s) > 7 + m Pl 
o 



V ia i 9(1 ~ 9') Jl_ 

k q - (i- q y ~i- q 



\ \-vV _ j pI^-pV) + bt pV 



i-Pi (i-pI) 2 i-pI 
V i-^ 3 - 5) (i-^ 3 " 6) ) 2 i-^ 3_) . 



.(17) 
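
For reference, the geometric progression and its derivative invoked in this step take the following standard closed forms for $q \ne 1$. The symbol $q$ and the summation limits here are generic; they are not claimed to match the exact indexing used in the derivation above.

```latex
\sum_{j=0}^{t-1} q^{j} = \frac{1 - q^{t}}{1 - q},
\qquad
\sum_{j=0}^{t-1} j\,q^{j} = \frac{q\,(1 - q^{t})}{(1 - q)^{2}} - \frac{t\,q^{t}}{1 - q}.
```

The second identity follows by differentiating the first with respect to $q$ and multiplying by $q$.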



We can also derive an upper bound on the total search involved, by considering: 

*-l 7 1 

w ^ ^ E E mW (^'> ji +o+i 



j=o j=i 



*-i t 



1 1 

= < 7 +m EE w ( 5 ^''7i+o- 



j=0 t=l 

We can substitute from equation (10) and reduce the summation over i by bounding 
terms from above to yield 

1 mpi 



W (s) < t i + ^p- Y,pi [c* - i^^+v^^- 1 

x(i+M 2 - w )[(*-^'- 3 ) 



[*-j-i]x 

»o(j,s-7J-2)' 

A* 
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where 

To reduce this, we note from our previous analysis that 

jo(j) < [« 2 - lj 
and similarly 

io{j,s--j-2)< [(a - 3)/* - lj . 



This yields 



"m*4 + t 



t-i 



X)(*-i) i0+ V i0 - 1 ^ 



j=0 



t-1 



+ y,pI (i + M 2 - 5)j ) (c - 4/) (t-j- i) 7 io 



i=o 



where 



7 = (s - 3)fi 

«-(--KH) 

to = [(5 - 3)/i - lj 
io= [« 2 -lJ- 

We can reduce the remaining summations by expanding out the first term, then 
bounding each remaining term in the summation by the largest term, which in this 
case is the second term. Using the results from above on the geometric progression 
and its derivatives, this leads to 



wMtl+az 



«*+V"- 1 (i + (« _ i)^) 



(3-S)t' 



+ ^<-» ffff+'T^iu 



1-P2 

yo [( ,_ 1)( ,_ 1) + c] QMLz^)__M 



P2 



+P 



(i-rf-7 



?S\2 



1 „ 3 ~ s 



(18) 



Proposition 4: If the sensory data arising from a correct interpretation are 
uniformly distributed among the spurious data, then the amount of search expended 
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by the normal constrained search method is bounded by 

1C-1 



c o «77 

m-2 c < W occ < m-2 c + - [1 + e] s 
c c e 



1 + 



P2 



1 + e 



m 3 s s 



+ —r- [i + P2] c . 



K 2 C 



Proof: 

In [Grimson 1989] we showed that the number of nodes at the k th level of the 
tree is bounded by 



2 c(fc) <n k < 2 c(fe) + 

+ mpi 



i -I k— c(k) ■ 

1 + rapipj 



-,«<*) 



<=<*j+ 



^ 



1 + P2 + »™PlPf 

[l + P2] c(k) - 1 [k + P 2(k-c(k))] 



1-P2 

where $c(k)$ is the number of data features actually part of the correct interpretation.
Using our earlier assumption that

$$c(k) = \delta k$$

we can estimate bounds on the amount of search in the normal case by considering 



s-l 



s-l 



m 



fc=i 



fc=i 



1 -1 k—Sk 



E nk - m E 2&k + 1 + m PiP2 1 + P2 + rnpxpl 



Sk 



+ mp\ 



I-P2 



[1 + P2]"- 1 [* + !*(* -**)]• 



Consider the first term: 



s-l 



m 



E*" 



k=\ 

Actually, if we are careful in our considerations, this sum is really 



fc=i 



and this reduces to 



1 c_1 



k=.\ 



Similarly, the second term is 

mY / [l + e] k -^[l + P2 + ef k i 



k=l 



where 

This reduces to 



J 



e = mpip| = npi 



^fc+j 



1 + 



P2 



1 + 6 



™E En+4*' 

and by using the same trick of finding the maximal term in a sum, this reduces to 

c-1 



Z[l + eY 



1 + 



P2 

1 + e 
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A similar argument can be applied to the remaining term in the summation, yielding 

— — - [I + P2] ■ 

P2 C 

Combining all three of these bounds together, we have 



W occ < m-2 c + — [1 + e]° 



1+ n 



1 + e 



c—\ 3 

m s s r „ 1C 

K Z C 
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